Abstract: P2P systems are an effective solution to the problem of
distributing resources. The main issue of P2P networks is that
searching for and retrieving resources shared by peers is usually
expensive and does not take into account similarities among
peers. In this paper we present preliminary simulations of
PROSA, a novel algorithm for P2P network structuring inspired
by social behaviours. Peers in PROSA self-organise into social
groups of similar peers, called "semantic-groups", depending on
the resources they share. Such a network smoothly evolves
into a small-world graph, where queries for resources are routed
efficiently and effectively.



I. Introduction 

In recent years social communities have been deeply
studied not only by psychologists and sociologists, but also by
computer scientists. The main point is that social communities
seem to naturally possess interesting characteristics that
can be exploited in computer science. Studying collaboration
communities, researchers have found an interesting structure
that seems to arise whenever a network of relationships among
entities is involved: the so-called "small-world" graph. A
small-world graph is a graph that presents a high clustering
coefficient (i.e. similar peers are usually linked to each other)
and a relatively small average path length (i.e. the average
number of intermediaries between two peers is small).

The small-world property seems to be a characteristic of
many human communities, such as mathematicians, actors and
scientists. A small world arises almost naturally whenever
social contacts among people are involved: many researchers
are trying to understand the reasons for this behaviour. In this
work we are not interested in answering this question. Our
goal is simply to develop a P2P system using rules and concepts
inspired by human behaviour and relationship dynamics.

In a social network there are several kinds of links among
people, from simple acquaintance to friendship. Note that
social links are usually not symmetric: all British people know
who the prime minister of the UK is, but the prime minister
does not directly know all of them. We say that a person
has an "acquaintance-link" to somebody else if he simply
knows him. In real life it is easy to gain acquaintance-links
to anybody: a person met on the stairs and a taxi driver
are examples of such links.

Nevertheless, our social life is mainly based upon "semantic-links".
A semantic-link is more than a simple acquaintance-link,
since it requires not only knowing a person, but also
sharing with him knowledge, culture, interests, job, hobbies
or abilities. We have semantic-links to our parents, brothers,
friends, colleagues, teachers and so on, because we share
with them our home (parents), our interests (friends), our job
(colleagues) and so on. Social life is usually heavily focused
on relationships with our "semantic-links", since we spend
most of our time talking, discussing, working and staying with
friends, parents, colleagues and so on.

For this reason we use semantic-links to solve everyday
problems: we ask our parents for advice, we talk
with friends about shared interests, we ask a colleague for help
in finding a bug in a program and so on.

Starting from these observations of social relationship
dynamics, we defined [2] a P2P structure, named PROSA,
in which similar peers build "semantic-links" to each other.

PROSA uses a self-organising distributed algorithm that
dynamically links peers sharing similar knowledge and resources,
putting them into highly clustered and self-structured
"semantic groups". Searching for and retrieving resources in a
semantic group is fast and efficient, since peers in
a group are strongly connected. Like real social communities,
PROSA naturally evolves into a small-world network (as reported
below), which allows peers to retrieve resources in a fast and
efficient way, even if the requested resources do not belong to
the semantic group of the requesting peer.

In this paper we report some preliminary simulation results
for PROSA. As shown in Section IV, the distributed algorithm
used to route queries and to manage links among peers yields
a high percentage of successfully answered queries
and a small average path length between peers. Comparisons
with a simple flooding strategy and with a "random walk" are
also reported.

The paper is organised as follows: Section II is a short survey
of current work in the field of P2P resource retrieval;
in Section III we discuss our proposal; in Section IV we show
simulation results and finally Section V presents a plan for
future work.

II. Related work 

PROSA is not the first attempt to organise a P2P network
taking into account the "semantic" proximity of shared resources
in order to optimise query routing. Some recent works ([1], [7])
proposed organising a P2P network in semantic groups of
"similar" peers, to facilitate resource search and retrieval based
on semantic queries. In particular, in SETS [1] the network is
split into semantic areas by a super-peer which also maintains a
table of group centroids; a centroid represents the "topic" of
a given area. The main drawback of SETS is the introduction
of a network manager, which represents a single point of failure.
In GES [7] peers maintain two sets of links to other peers:
semantic-links and random-links. Queries for resources are
first forwarded to a so-called "semantic-target", which is the
first peer that can answer the query, and then flooded to this
peer's neighbours (the semantic group).

PROSA is an early attempt to implement a bio-inspired
link management algorithm in a pure P2P overlay network.
We think it is worthwhile to study real networks, such
as social communities, in order to find new and effective
algorithms for sharing, searching and distributing resources
in P2P environments.

III. PROSA 

PROSA is a P2P network based on acquaintance- and
semantic-links, where peers join the network in a way similar
to a "birth", then acquire more links to other peers according
to the social model, i.e. by linking (semantically) with
peers which have similar interests, culture, hobbies, work
and so on, while maintaining a certain number of "random"
acquaintances. In P2P networks the culture or knowledge of
a peer is represented by the resources (documents) it shares
with other peers, while different types of "links"
among peers model acquaintances and semantic-links. To
implement such a model it is necessary to have:

• A system to model knowledge, culture, interests, etc.

• A self-organising network management algorithm

A. Modelling Knowledge 

In PROSA, knowledge (each resource) is modelled through the
Vector Space Model (VSM) [5]. In this approach each
document is represented by a state-vector of (stemmed) terms
called Document Vector (DV); each term in the vector is
assigned a weight based on the relevance of the term itself
inside the document. This weight is calculated using a modified
version of the TF-IDF [4] scheme, as follows:

$$w_{t,D} = 1 + \log(f_t)$$

where $f_t$ is the frequency of term $t$ in the document. It has
been shown [5] that this way of calculating relevance is a
good approximation of the TF-IDF ranking scheme. The VSM
representation of a document is necessary to calculate the
relevance of a document with respect to a certain query. We
model a query by means of a so-called Query Vector (QV),
that is, the VSM representation of the query itself. Since both
documents and queries are represented by state-vectors, we
define the relevance of a document (D) with respect to a given
query (Q) as follows:

$$r(D,Q) = \sum_{t \in D \cap Q} w_{t,D} \cdot w_{t,Q} \qquad (1)$$
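The weighting scheme and equation (1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `term_weights` and `relevance` are hypothetical, the logarithm is assumed to be the natural one (the text does not state the base), and terms are assumed to be already tokenised and stemmed.

```python
import math
from collections import Counter

def term_weights(terms):
    """Map each term to 1 + log(f_t), where f_t is its raw count.

    Assumption: natural logarithm; input terms are already stemmed.
    """
    counts = Counter(terms)
    return {t: 1.0 + math.log(f) for t, f in counts.items()}

def relevance(doc_vec, query_vec):
    """Equation (1): sum of w_{t,D} * w_{t,Q} over terms shared by D and Q."""
    shared = doc_vec.keys() & query_vec.keys()
    return sum(doc_vec[t] * query_vec[t] for t in shared)

# A toy Document Vector and Query Vector.
doc = term_weights(["peer", "network", "peer", "query"])
query = term_weights(["peer", "query"])
score = relevance(doc, query)
```

Only the terms that appear in both vectors contribute to the score, so a document sharing no term with the query has relevance zero.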



Using VSM we also obtain a compact description of a peer's
knowledge. This description is called "Peer Vector" (PV), and
is computed as follows:

- For each document hosted by the peer, the frequencies of
the terms it contains ($F_{t,D}$) are computed.

- Term frequencies for different documents are summed
together, obtaining the overall frequency of each term:

$$F_t = \sum_{D \in P} F_{t,D}$$

- Then a weight is computed for each term, using:

$$w_{t,P} = 1 + \log(F_t)$$

- Finally all weights are put into a state-vector and the
vector is normalised.

The resulting PV is a sort of "snapshot" of the peer's
knowledge, since it contains information about the relevant
terms of the documents it shares.

The relevance of a peer (P) with respect to a given query
(Q) is defined as follows:

$$r(P,Q) = \sum_{t \in P \cap Q} w_{t,P} \cdot w_{t,Q}$$

This relevance is used by the PROSA query routing algorithm.
It is worth noting that a high relevance between a QV and a
PV means that the given peer probably has documents that
can match the query.
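The Peer Vector construction and the peer relevance can be sketched as follows. The names `peer_vector` and `peer_relevance` are hypothetical; the natural logarithm and the Euclidean (L2) normalisation are assumptions, since the text only says that the vector "is normalised".

```python
import math
from collections import Counter

def peer_vector(documents):
    """Build a normalised Peer Vector from a list of tokenised documents."""
    totals = Counter()
    for terms in documents:
        totals.update(terms)  # F_t: per-document frequencies summed together
    weights = {t: 1.0 + math.log(f) for t, f in totals.items()}
    # Assumption: Euclidean normalisation.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}

def peer_relevance(pv, qv):
    """Sum of w_{t,P} * w_{t,Q} over the terms shared by the PV and the QV."""
    shared = pv.keys() & qv.keys()
    return sum(pv[t] * qv[t] for t in shared)

pv = peer_vector([["jazz", "music"], ["jazz", "guitar"]])
```

A term that occurs in several hosted documents (here "jazz") ends up with a larger weight than one that occurs only once, which is what makes the PV a snapshot of the peer's main topics.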

B. Network Management algorithm 

As stated above, relationships among people are usually
based on similarities in interests, culture, hobbies, knowledge
and so on. These kinds of links usually evolve from simple
"acquaintance-links" to what we called "semantic-links".

To implement this behaviour three types of links have been 
introduced: 

- Acquaintance-Link (AL) 

- Temporary Semantic-Link (TSL) 

- Full Semantic-Link (FSL)

TSLs represent relationships based on a partial knowledge of
a peer. They are usually stronger than ALs and weaker than
FSLs.

Since relationships are usually not symmetric, it is necessary
to specify which are the source peer (SP) and the destination peer
(DP) of a link. Figure 1 shows the representations of the three
different types of links.




Fig. 1. Link types 

Fig. 2. A new node joining PROSA

Each peer in PROSA maintains a list of known peers, which
we call Peer List (PL). This list contains all the links gained
by a peer during its "life". It is similar to a personal phone
book: when we meet a person we link to him with an AL. If
we share interests, knowledge or anything else with him, the
AL pointing to him smoothly becomes a semantic-link. It first
evolves into a Temporary Semantic-Link, and then into a Full
Semantic-Link.
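One possible in-memory representation of the three link types and of the Peer List is sketched below. The class and method names (`LinkType`, `Link`, `promote`) are hypothetical, and the rule that a link is never downgraded is an assumption made for this sketch, not something stated in the text.

```python
from dataclasses import dataclass, field
from enum import IntEnum

class LinkType(IntEnum):
    AL = 0   # Acquaintance-Link
    TSL = 1  # Temporary Semantic-Link
    FSL = 2  # Full Semantic-Link

@dataclass
class Link:
    destination: str                            # the DP; the owner of the list is the SP
    kind: LinkType = LinkType.AL
    vector: dict = field(default_factory=dict)  # TPV for a TSL, PV for an FSL

    def promote(self, kind, vector):
        """Strengthen the link (assumption: links are never downgraded)."""
        if kind > self.kind:
            self.kind = kind
            self.vector = vector

# The Peer List maps a destination peer to the (directed) link pointing to it.
peer_list = {"P1": Link("P1")}
peer_list["P1"].promote(LinkType.TSL, {"jazz": 1.0})
peer_list["P1"].promote(LinkType.FSL, {"jazz": 0.8, "guitar": 0.6})
```

Because each peer keeps its own list, a link from N to P1 says nothing about a link from P1 to N, which matches the non-symmetric relationships described above.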

1) Joining: The case of a node that wants to join an existing
PROSA network is similar to the birth of a child. At the
beginning of his life a child "knows" just a couple of people
(his parents). A new peer that wants to join simply searches for
other peers (for example using broadcasting, or by selecting
them from a list of peers that are supposed to be up, as in
Freenet [3] or Gnutella) and adds some of them to its PL
as ALs. These are ALs because a new peer does not know
anything about its "relatives" until it queries
them for resources. This behaviour is quite easy to understand:
when a baby is born he does not know anything about
his parents. He knows neither his father's job nor that his
mother is a biologist. The joining phase is represented in Figure
2, where "N" is the new peer; N chooses some other peers (P)
at random as initial ALs.

2) Updating: In PROSA, FSL dynamics are strictly related
to queries. When a user of PROSA requires a resource, he
performs a query and specifies the number of results he
wants to obtain. The relevance of the query with respect to the
resources hosted by the user's peer is first evaluated, using
equation (1). If none of the hosted resources has a sufficient
relevance with respect to the query, the query has to be
forwarded to other peers. The mechanism is quite simple:

- A query message is built, containing the QV, a (possibly) unique
QueryID, the source address and the required number of
results.

- If the peer has neither FSLs nor TSLs, i.e. it has only ALs,
the query message is forwarded to one link chosen at random.

- Otherwise, the peer computes the relevance between the
query and each entry of its Peer List.

- It selects the link with the highest relevance, if it exists, and
forwards the query message to it.
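The choice of the next hop described in the steps above can be sketched as follows. The function `next_hop` and the tuple layout of the Peer List are hypothetical. The text does not say what happens when no semantic link has a positive relevance; falling back to a random link in that case is an assumption of this sketch.

```python
import random

def relevance(vec, qv):
    """Dot product over the terms shared by a link vector and the QV."""
    shared = vec.keys() & qv.keys()
    return sum(vec[t] * qv[t] for t in shared)

def next_hop(peer_list, qv, rng=random):
    """peer_list maps peer id -> (kind, vector), kind in {'AL', 'TSL', 'FSL'}."""
    if not peer_list:
        return None
    # ALs carry no vector, so only TSLs and FSLs can be scored.
    scores = {p: relevance(vec, qv)
              for p, (kind, vec) in peer_list.items() if kind != "AL"}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] <= 0.0:
        # Only ALs, or no relevant semantic link (assumption): pick at random.
        return rng.choice(sorted(peer_list))
    return best

links = {
    "A": ("AL", {}),
    "B": ("TSL", {"jazz": 1.0}),
    "C": ("FSL", {"jazz": 0.2, "rock": 0.9}),
}
```

With these links, a query about "jazz" is routed to B and a query about "rock" to C, while a peer holding only ALs forwards to one of them at random.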

When a peer receives a query forwarded by another peer,
it first updates its PL. If the requesting peer is
unknown, a new TSL to that peer is added to the PL, and the QV
becomes the corresponding Temporary Peer Vector (TPV). If
the requesting peer is a TSL for the peer that receives the
query, the corresponding TPV in the list is updated by adding
the received QV and normalising the result. If the requesting
peer is an FSL, its PV is already in the PL, and no updates are
necessary.
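The Peer List update performed on receipt of a query can be sketched as below. The names are hypothetical and Euclidean normalisation is assumed. The text does not cover a requester that is currently an AL; treating it like an unknown peer (it becomes a TSL) is an assumption of this sketch.

```python
import math

def normalise(vec):
    """Scale a vector to unit Euclidean length (assumed normalisation)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def update_peer_list(peer_list, sender, qv):
    """peer_list maps peer id -> [kind, vector], kind in {'AL', 'TSL', 'FSL'}."""
    entry = peer_list.get(sender)
    if entry is None or entry[0] == "AL":
        # Unknown requester (or AL, by assumption): the QV becomes its TPV.
        peer_list[sender] = ["TSL", dict(qv)]
    elif entry[0] == "TSL":
        # Known TSL: add the QV to the stored TPV and renormalise.
        merged = dict(entry[1])
        for t, w in qv.items():
            merged[t] = merged.get(t, 0.0) + w
        entry[1] = normalise(merged)
    # FSL: the full PV is already stored, nothing to update.

pl = {"F": ["FSL", {"rock": 1.0}]}
update_peer_list(pl, "N", {"jazz": 1.0})
update_peer_list(pl, "N", {"rock": 1.0})
update_peer_list(pl, "F", {"jazz": 1.0})
```

After the two queries from N, its TPV weights "jazz" and "rock" equally, while the entry for the FSL peer F is left untouched.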



After the PL update, the relevance of the query to the peer's
resources is computed. There are three possible cases:

- None of the hosted documents has a sufficient relevance.
In this case the query is forwarded to another peer, using
the same mechanism used by the forwarding peer. The
query message is not modified.

- The peer has a certain number of relevant documents, but
they are not enough to fulfil the request. In this case a
response message is sent to the requesting peer, specifying
the number of matching documents and the corresponding
relevance. The query message is forwarded to all the links
in the PL whose relevance to the query is higher than
a given threshold (semantic flooding). The number of
matched resources is subtracted from the total number of
requested documents before forwarding.

- The peer has enough relevant documents to fulfil
the request. In this case a result message is sent to the
requesting peer and the query is not forwarded any further.
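The three cases above can be sketched as a small decision function. The names `handle_query`, `flood_targets`, `min_relevance` and `link_threshold` are hypothetical, and the text does not give values for the two thresholds.

```python
def handle_query(local_scores, requested, min_relevance):
    """Classify a received query.

    local_scores: relevance of each hosted document to the query.
    Returns (action, matched, still_needed).
    """
    matched = sum(1 for s in local_scores if s >= min_relevance)
    if matched == 0:
        # Case 1: forward the unmodified query to a single next hop.
        return ("forward", 0, requested)
    if matched < requested:
        # Case 2: answer partially, then flood for the remaining documents.
        return ("respond_and_flood", matched, requested - matched)
    # Case 3: the request is fully satisfied, stop forwarding.
    return ("respond", requested, 0)

def flood_targets(link_scores, link_threshold):
    """Semantic flooding: every link whose relevance exceeds the threshold."""
    return sorted(p for p, s in link_scores.items() if s > link_threshold)
```

For example, a peer holding two sufficiently relevant documents for a query asking for five would answer with two and forward a request for the remaining three to its most relevant links.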




Fig. 3. Query forwarding: new TSL arises

This situation is shown in Figure 3, where peer "N"
forwards a query to one of its ALs, randomly chosen, since it
has neither TSLs nor FSLs. In our example the chosen peer is
"P1". As soon as P1 receives the QV, it automatically establishes
a TSL with N (see Figure 3) and then forwards the query if
needed.

When the requesting peer receives a response message,
it presents the results to the user. If the user decides to
download a certain resource from another peer, the requesting
peer contacts the peer owning that resource and asks it for
the download. If the download is accepted, the resource is sent
to the requesting peer, together with the Peer Vector of the serving
peer. This case is illustrated in Figure 4, where peer "N"
received a response from peer "Pr" and decided to download
the corresponding resource. Note that Pr established a TSL
with N, because it received a QV from it, and N established an
FSL with Pr, because it successfully received a resource from
it.
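The asymmetry between the two links created by a successful download can be illustrated with a minimal sketch. The handler names and the tuple layout are hypothetical, and the serving side is simplified to the case of an unknown requester.

```python
def on_query_received(peer_list, requester, qv):
    """Serving side: an unknown requester is remembered through a TSL."""
    peer_list.setdefault(requester, ("TSL", dict(qv)))

def on_download_completed(peer_list, server, server_pv):
    """Requesting side: a successful download yields an FSL with the server's PV."""
    peer_list[server] = ("FSL", dict(server_pv))

# Peer N queries Pr and then downloads a resource from it.
links_of_N, links_of_Pr = {}, {}
on_query_received(links_of_Pr, "N", {"jazz": 1.0})
on_download_completed(links_of_N, "Pr", {"jazz": 0.8, "guitar": 0.6})
```

After the exchange N holds an FSL to Pr, while Pr only holds a TSL to N, since it has seen a single query from N and not its full Peer Vector.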

IV. PROSA simulations and results

The main goal of this work is to show that a relationship-inspired
network naturally evolves into a small world. Simulation
results confirm that PROSA is a small-world network: it
presents a high clustering coefficient and a small average path
length.

Fig. 4. Query forwarding: new FSL arises

Since links between peers in PROSA are not symmetric,
it is possible to represent a PROSA network as a directed
graph G(V,E). The clustering coefficient of a node n ($CC_n$) in
a directed graph can be defined as follows:

$$CC_n = \frac{E_{n,\mathrm{real}}}{E_{n,\mathrm{tot}}} \qquad (2)$$

where n's neighbours are all the peers to which n is linked,
$E_{n,\mathrm{real}}$ is the number of edges between n's neighbours and
$E_{n,\mathrm{tot}}$ is the maximum number of possible edges between n's
neighbours. Note that if k is in the neighbourhood of n, the
converse is not guaranteed, because links are
directed. The clustering coefficient of a graph (CC) is defined
as the mean clustering coefficient over all the vertices (nodes) in
the graph:

$$CC = \frac{1}{|V|} \sum_{n \in V} CC_n \qquad (3)$$

In Figure 5 the CC and average path length (APL) of PROSA
are compared to those of the "equivalent" random graph (rnd).
Given a graph G(V,E), its equivalent random graph has the
same number of nodes and edges and a random out-degree
distribution.

The CC and the APL of a random graph with |V| vertices
and |E| edges have been computed using equations (4) and (5)
[6].

# nodes | # edges | CC_prosa | APL_prosa | CC_rnd | APL_rnd | CC_prosa/CC_rnd

Fig. 5. Clustering coefficients and APL for different network sizes

These measures regard the case of PROSA networks where
each peer starts with 20 documents on average. The CC and
APL are computed after 10,000 queries. Each query contains
4 terms, on average. A query is considered "successful"
when at least one matching document is found. The maximum
number of required documents is 5.

Looking at the results, it is clear that PROSA networks
always present a higher clustering coefficient than the
corresponding random graphs. This means that each peer is linked
with a strongly connected neighbourhood, which represents
(a part of) the "semantic group" joined by the peer. This
behaviour is due to the fact that links are mainly "semantic
links" (both FSLs and TSLs) with nodes that provided (or
requested) resources belonging to a given field. Note also that
the APL of a PROSA network decreases when the number
of nodes increases, while it seems to depend linearly on the
network size for the corresponding random graph. Note that
the APL for PROSA is measured as the average depth of
a query, so it represents a very accurate estimate of the real
APL.

Fig. 6. Percentage of successful queries
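The clustering coefficient of equations (2) and (3) can be computed on a directed graph as sketched below. The function names are hypothetical. The maximum number of edges among k neighbours is taken to be k(k-1), which is the usual choice for directed graphs but an assumption here, as is assigning a coefficient of 0 to nodes with fewer than two neighbours.

```python
def node_cc(graph, n):
    """Equation (2) for node n; graph maps node -> set of out-neighbours."""
    neigh = graph.get(n, set()) - {n}
    k = len(neigh)
    if k < 2:
        return 0.0  # assumption: no pair of neighbours, coefficient is 0
    # Count the directed edges that start and end inside n's neighbourhood.
    real = sum(1 for u in neigh for v in graph.get(u, set())
               if v in neigh and v != u)
    return real / (k * (k - 1))  # assumption: E_tot = k(k-1)

def graph_cc(graph):
    """Equation (3): mean of the node coefficients over all vertices."""
    return sum(node_cc(graph, n) for n in graph) / len(graph) if graph else 0.0

# Node a links to b and c, which link to each other in both directions.
g = {"a": {"b", "c"}, "b": {"c"}, "c": {"b"}, "d": {"a"}}
```

In this toy graph only node a has two neighbours, and both possible directed edges between them exist, so its coefficient is 1 while the graph average is 0.25.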

To evaluate the efficiency of PROSA we compared it with
a pure flooding network and a random-walk network. In a
flooding network queries are routed using a classical flooding
algorithm; in a random-walk network queries are forwarded
through randomly chosen links. Figure 6 shows the rate of
successful queries for PROSA, a pure flooding network and a
random-walk network. The highest percentage is, as expected,
obtained with pure flooding, because in that case most
of the network is visited, and if matching resources
exist, they will eventually be found. PROSA proves to be more
efficient than a random walk for different network sizes. This
is mainly due to the fact that peers in PROSA do not forward
queries at random, but use the algorithm described in Section
III-B.2. If a peer does not have relevant documents for a
given query, it forwards the query to one of the peers it links
to, choosing the one that has the best "relevance" to the
query. This way the query is routed to those peers that can
(probably) answer it. Note that the efficiency of PROSA with
respect to the percentage of successful queries is related to the
number of queries performed by peers, since semantic-links are
a side effect of searching for and retrieving resources. We obtain
a decreasing percentage of successful queries when the network
size grows, because the total number of queries is the same
for all the simulations reported.

In Figure 7 we show the average number of distinct
links visited for a successful query, both for PROSA and
for a pure random-walk search¹. Using PROSA we obtain
a smaller average number of visited links per successful query
than with a pure random search. We can explain this
fact as a consequence of both the query routing algorithm
used by PROSA and the link updating policy. In PROSA new
links among similar peers arise almost naturally; a new FSL
is established for every document obtained by a peer as a query
result. Since a peer interested in a particular field usually
queries for resources in that field, the higher the number
of queries performed, the higher the number of new FSLs to
a specific semantic group. After a (small) number of queries,
a peer becomes strongly connected to other peers in the
same group. New queries will be forwarded directly to the
best-matching group, in a small number of steps.

¹ Results for pure flooding are not reported, since they are from 100 to 150
times larger than those of PROSA and random walk.

Fig. 7.

In Figure 8 we show the average number of retrieved documents
per query for PROSA, a pure flooding-based search
(such as that implemented in the first version of Gnutella)
and a simple random walk. The average number of documents
retrieved by PROSA is always higher than that obtained with
a plain random walk. This is because forwarding
queries at random does not guarantee that resources are found.
On the other hand, if a query is forwarded to a "relevant" peer
(i.e. a peer that contains documents that match the query), it
is highly probable that it succeeds.

Fig. 8. Average # of retrieved documents per query

Finally, Figure 9 shows the average query depth, i.e. the
average number of hops needed to satisfy a query. It is clear
that the average query depth for PROSA is considerably lower
than in the case of pure flooding or a random walk.

Fig. 9.

All the considerations above lead to these conclusions:

• The PROSA routing algorithm is fast and efficient. Routing
queries in the direction of the semantic group that
can satisfy them is a winning strategy.

• The PROSA link management algorithm allows similar
peers to form highly connected clusters. This allows
queries to be answered faster and more efficiently.

V. Future work

In this paper a novel P2P self-organising algorithm for
resource searching and retrieval has been presented. The
algorithm emulates the way social relationships among people
naturally arise and evolve, and ultimately produces a small-world
network topology, as confirmed by simulation results.
PROSA is a valid alternative to current P2P structures based
on simple flooding or random walk. In fact, similar peers in
PROSA form strongly interconnected "semantic-groups", allowing
fast and efficient query routing. Future work will focus
on extending PROSA in order to include other mechanisms
typically found in social communities, such as:

• Random meetings among peers (which allow peers to
connect, via ALs, also to non-similar peers)

• Advertising of new resources

• Semantic-link scoring, to simulate various possible
degrees of acquaintance among people